This is a summary to help you reproduce our empirical results presented in the paper: "Forecasting the Path of US CO2 Emissions Using State-Level Information".

I, Ralf Steinhauser, authored this readme file, so please send any inquiries to me at ralf.steinhauser@anu.edu.au.

I recommend using a text editor with word wrap enabled, as this document was written in one.


The empirical analysis and graphs were done using MATLAB (version 7.5.0, R2007b).

Overview: We use four distinct m-files which do the following: 
		
		First, the two files POP_GDPinterpolation80new2001.m and POP_GDPinterpolation2001onward.m use piecewise cubic interpolation to fill in the missing years of the population projections and generate income projections assuming a bivariate normal distribution, as described in Section 3.5.
		
		Then there is the main and largest m-file (reg25_inclAgg_doingNonIterativeUncon.m), which is used to generate and run regressions on all models in our model universe (incl. the benchmarks), to generate forecasts given the regression parameters, and to calculate the MSFE for each model. At the end it also runs the Bootstrap Reality Check Test with the benchmark models from the literature.
		
		Finally, the Bootstrap_in_sampleCriteria.m file runs the bootstrap tests with the models selected by in-sample criteria as benchmarks and outputs the MSFEs as they appear in Tables 3 and 4 of the paper.
		
		
Other files called from within the above are bsds_ralf_ess.m, which in turn calls two files: block_bootstrap.m and stationary_bootstrap.m. As you can see in the file itself, it was written by Kevin Sheppard, with some alterations by me to correct some errors and make it work for our data.
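For readers unfamiliar with the test, here is a minimal Python sketch of White's Reality Check with a stationary (Politis-Romano) bootstrap, the kind of procedure that bsds_ralf_ess.m and stationary_bootstrap.m carry out in MATLAB. It is not a translation of those files: the input layout (a T x K list of loss differentials, benchmark loss minus model loss), the function names, the restart probability p and the number of replications are all assumptions made for illustration.

```python
import math
import random


def stationary_bootstrap_indices(n, p, rng):
    """Resample positions 0..n-1 in blocks of geometric length (mean 1/p).

    Blocks wrap around the end of the sample, as in the stationary bootstrap.
    """
    idx = [rng.randrange(n)]
    for _ in range(n - 1):
        if rng.random() < p:
            idx.append(rng.randrange(n))        # start a new block at a random position
        else:
            idx.append((idx[-1] + 1) % n)       # continue the current block circularly
    return idx


def reality_check_pvalue(d, p=0.1, reps=1000, seed=0):
    """Reality Check p-value for H0: no model beats the benchmark.

    d[t][k] is the loss of the benchmark minus the loss of model k in period t,
    so positive values favour model k (hypothetical input layout).
    """
    rng = random.Random(seed)
    T, K = len(d), len(d[0])
    means = [sum(row[k] for row in d) / T for k in range(K)]
    stat = max(math.sqrt(T) * m for m in means)
    exceed = 0
    for _ in range(reps):
        idx = stationary_bootstrap_indices(T, p, rng)
        boot = [sum(d[i][k] for i in idx) / T for k in range(K)]
        # Recentre each bootstrap mean on the sample mean to mimic the null.
        boot_stat = max(math.sqrt(T) * (b - m) for b, m in zip(boot, means))
        if boot_stat >= stat:
            exceed += 1
    return exceed / reps


if __name__ == "__main__":
    gen = random.Random(1)
    # 40 forecast periods, 3 candidate models with no true advantage over the benchmark
    d = [[gen.gauss(0.0, 1.0) for _ in range(3)] for _ in range(40)]
    print(reality_check_pvalue(d, reps=500))
```

A small p-value suggests that at least one model beats the benchmark even after accounting for the search over all K models; recentring the bootstrap means is what makes the test account for that data snooping.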


Now to each m-file and its inputs/outputs in more detail:

	I will now explain the main regression file reg25_inclAgg_doingNonIterativeUncon.m starting with its inputs:
		The code uses data from three MAT-files (input1960-2001all.mat, forecast80_2001.mat and forecast01Fold.mat), which I explain below:
			
			input1960-2001all.mat contains all non-projected variables as state-year observations for the years 1960 through 2001 (that is, 42 years x 50 states = 2100 observations).
			The merged data include the following series, with the relevant source in parentheses:
				
				- area - State area (from Department of Commerce, Bureau of the Census. See also http://www.infoplease.com/ipa/A0108355.html#axzz0wvcbETbE)	
				
				- HDD/CDD  - Heating and Cooling Degree Days (from U.S. Climate Normals from the Environmental Information Summaries C-23. See also http://www.ncdc.noaa.gov/oa/documentlibrary/pdf/eis/c23a.pdf)
				
				- co2/co2pc - Total Carbon Emissions and Per Capita Carbon Emissions (from Blasing, T.J., C.T. Broniak, and G. Marland (2004) Trends: A Compendium of Data on Global Change (Oak Ridge National Laboratory, U.S. Department of Energy, Oak Ridge, TN, U.S.A.: Carbon Dioxide Information Analysis Center). See also http://cdiac.ornl.gov/ftp/trends/emis_mon/stateemis/percapbystate.csv)
				
				- pop/popdens - Population and Population Density (the population data were also taken from Blasing et al. (2004), who in turn took them from EIA 2003a and the U.S. Census Bureau. The popdens series is computed as pop/area. The link to the data source is the same as for co2 above)
				
				- income/incompc - Personal Income (total and per capita), converted to 2001 US$ (from the Bureau of Economic Analysis)
				
				- crisis - A dummy variable for each of the economic downturns, equal to 1 in the respective years (1973-75, 1979-81, 1990-91) and 0 otherwise; see Table 1
				
				- coal/coastal/oil - State-specific dummies equal to 1 if the state produces coal, is coastal, or produces oil, respectively (these are also combined into one variable called dummy in the MAT-file)
				
				- stateID/timeID/year - Other variables describing each observation 
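As a small illustration of the panel layout described above, the following Python sketch builds the 50-state by 42-year index, one dummy per downturn, and popdens = pop/area. The dummy names, the state coding and the units of pop and area are hypothetical; the actual variables are stored in input1960-2001all.mat.

```python
YEARS = list(range(1960, 2002))     # 1960 through 2001: 42 years
STATE_IDS = list(range(1, 51))      # hypothetical stateID coding 1..50

# One dummy per downturn, following the years listed for the crisis variable.
DOWNTURNS = {
    "crisis_1973_75": range(1973, 1976),
    "crisis_1979_81": range(1979, 1982),
    "crisis_1990_91": range(1990, 1992),
}


def crisis_dummies(year):
    """Return a 0/1 value for each downturn dummy in the given year."""
    return {name: int(year in years) for name, years in DOWNTURNS.items()}


def popdens(pop, area):
    """Population density as pop/area (units follow whatever pop and area use)."""
    return pop / area


# State-year panel: one row per (stateID, year) pair.
panel = [(s, y) for s in STATE_IDS for y in YEARS]

if __name__ == "__main__":
    print(len(panel))               # 42 years x 50 states
    print(crisis_dummies(1974))
```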
			
			
			
			The second input file, forecast80_2001.mat, contains the population and income projections made in the '80s for the years of the '90s, for use in the out-of-sample experiments.
				This data file is generated with the POP_GDPinterpolation80new2001.m file, as described below and in Section 3.5 of the paper, using the Census Bureau population projections by Wetrogan (1983) (see also the original publication and the included Excel files described below).
				
			
			Finally, forecast01Fold.mat contains the equivalent population and income projections for the years after 2001.
				This data file is generated with the POP_GDPinterpolation2001onward.m file, as described below and in Section 3.5 of the paper, using the Census Bureau Population Division's State Interim Population Projections, published at http://www.census.gov/population/www/projections/projectionsagesex.html


		With these three input files in place, you can run the (rather long) main code. The code includes comments throughout to help you see what I am doing at each step. In short, the code runs through each of our 27000 models and the benchmarks, performing the out-of-sample experiment. That is, for each model it holds back the final years of the sample, runs a regression on the remaining data, uses the estimated coefficients to make a prediction (in our case the forecasts for the years 1992, 1993, ..., 2001), and computes the mean squared forecast error (MSFE) from the differences between the forecasts and the actual emissions. It also runs the regressions on the entire 1960-2001 sample and forecasts 10 years beyond 2001. What we end up with are the variables 'f_hat', 'f_hatA', and 'f_bench', named according to Appendix A; that is, 10 MSFEs for each model from the out-of-sample experiment. We also obtain each model's in-sample selection criteria and its forecasts beyond 2001.
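The core of the out-of-sample experiment for a single model can be sketched in Python with NumPy, assuming a linear model estimated by OLS with all states stacked. This is a simplified illustration, not the code in reg25_inclAgg_doingNonIterativeUncon.m: the function name and arguments are hypothetical, and computing one MSFE per forecast year by averaging squared errors across states is an assumption about how the 10 MSFEs are formed.

```python
import numpy as np


def out_of_sample_msfe(X_est, y_est, forecast_blocks):
    """Fit OLS on the estimation sample and return (beta, list of MSFEs).

    X_est, y_est    : regressors and emissions for the estimation years, all states stacked.
    forecast_blocks : one (X_h, y_h) pair per forecast year (e.g. 1992, ..., 2001).
                      X_h should be built from the projected population and income
                      (forecast80_2001.mat); y_h holds the actual emissions by state.
    """
    beta, *_ = np.linalg.lstsq(X_est, y_est, rcond=None)
    msfe = []
    for X_h, y_h in forecast_blocks:
        errors = y_h - X_h @ beta                   # forecast errors across states
        msfe.append(float(np.mean(errors ** 2)))    # mean squared forecast error for that year
    return beta, msfe


if __name__ == "__main__":
    x = np.arange(10.0)
    X = np.column_stack([np.ones_like(x), x])      # intercept and one regressor
    y = 2.0 + 3.0 * x
    # Three "forecast years" whose actual values miss the fitted line by 0, 1 and 2.
    beta, msfe = out_of_sample_msfe(X, y, [(X, y + h) for h in range(3)])
    print(beta, msfe)
```

Passing the forecast-period regressors separately keeps the experiment honest: the forecasts use only the projections that were available in the '80s, not the realised population and income.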
		At the end, the code runs the Reality Check bootstrap test with each of the benchmark models in turn as the benchmark, and it also outputs the specifications of the best models according to the per capita and the aggregate MSFEs. The resulting workspace is about 200 MB in size.
		
		
	Next, POP_GDPinterpolation80new2001.m generates forecast80_2001.mat, which, as discussed above, contains the population and income forecast series for the '90s.
			POP_GDPinterpolation80new2001.m uses as inputs the included Excel files with population data for 1980, 1990 and 2000 based on the Census Bureau population projections by Wetrogan (1983) (pop1980.xls, pop1990.xls, pop2000.xls). It also uses income and population data from before 1980 to determine the parameters of the bivariate normal distribution of the two series, so it reads in the main data file input1960-2001all.mat as well.
			We use piecewise cubic interpolation to fill in the missing years of the population series and generate a predicted income series, as discussed in Section 3.5.
			The outputs are saved in forecast80_2001.mat in the form the main m-file needs as input to run our true out-of-sample indirect prediction experiments.
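The README does not state which piecewise cubic scheme is used. As an illustration only, the Python sketch below fills in the years between the 1980, 1990 and 2000 population values with a shape-preserving piecewise cubic Hermite interpolant (PCHIP, with Fritsch-Carlson style slopes as in MATLAB's pchip). Treat the choice of PCHIP, the function name pchip_fill and the sample numbers as assumptions; the m-file may use a different cubic variant.

```python
def _end_slope(h0, h1, del0, del1):
    """One-sided, shape-preserving slope at an end point."""
    d = ((2 * h0 + h1) * del0 - h0 * del1) / (h0 + h1)
    if d * del0 <= 0:                       # wrong sign (or zero): flatten
        return 0.0
    if del0 * del1 < 0 and abs(d) > abs(3 * del0):
        return 3 * del0                     # avoid overshoot next to a turning point
    return d


def _slopes(x, y):
    """Derivative at each knot: weighted harmonic mean of neighbouring secants."""
    n = len(x)
    h = [x[i + 1] - x[i] for i in range(n - 1)]
    delta = [(y[i + 1] - y[i]) / h[i] for i in range(n - 1)]
    if n == 2:
        return [delta[0], delta[0]]
    d = [0.0] * n
    for k in range(1, n - 1):
        if delta[k - 1] * delta[k] > 0:     # same sign: keep the curve monotone
            w1 = 2 * h[k] + h[k - 1]
            w2 = h[k] + 2 * h[k - 1]
            d[k] = (w1 + w2) / (w1 / delta[k - 1] + w2 / delta[k])
    d[0] = _end_slope(h[0], h[1], delta[0], delta[1])
    d[-1] = _end_slope(h[-1], h[-2], delta[-1], delta[-2])
    return d


def pchip_fill(x, y, targets):
    """Evaluate the piecewise cubic Hermite interpolant of (x, y) at each target.

    x must be strictly increasing; targets outside [x[0], x[-1]] are extrapolated
    with the first or last cubic piece.
    """
    d = _slopes(x, y)
    out = []
    for t in targets:
        k = 0
        while k < len(x) - 2 and t > x[k + 1]:
            k += 1
        h = x[k + 1] - x[k]
        delta = (y[k + 1] - y[k]) / h
        s = t - x[k]
        c = (3 * delta - 2 * d[k] - d[k + 1]) / h
        b = (d[k] - 2 * delta + d[k + 1]) / h ** 2
        out.append(y[k] + s * (d[k] + s * (c + s * b)))
    return out


if __name__ == "__main__":
    knots = [1980, 1990, 2000]
    pop = [4.2, 4.6, 5.3]                   # hypothetical state population, millions
    years = list(range(1980, 2001))
    print([round(v, 3) for v in pchip_fill(knots, pop, years)])
```

A shape-preserving cubic is a natural choice for population paths because it reproduces the known values exactly and never overshoots between them, so a steadily growing state is not given a spurious dip.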
			
	Similarly, POP_GDPinterpolation2001onward.m generates forecast01Fold.mat, which, as discussed above, contains the population and income forecast series past 2001.
			POP_GDPinterpolation2001onward.m uses as input the included Excel file pop2004_2030.xls, with population projections for 2004-2030 from the Census Bureau Population Division's State Interim Population Projections (published at http://www.census.gov/population/www/projections/projectionsagesex.html). It also uses income and population data from before 2001 to determine the parameters of the bivariate normal distribution of the two series, so it reads in the main data file, input1960-2001all.mat, as well. It also imports the state area file, area.xls, from the Department of Commerce, Bureau of the Census (this is not strictly necessary, as the area data are already included in the main input file).
			We use piecewise cubic interpolation to fill in the missing years of the population series (2002-2003) and generate an income series, as discussed in Section 3.5.
			The outputs are saved in forecast01Fold.mat in the form the main m-file needs as input for the forecasts beyond 2001.
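Section 3.5 of the paper gives the exact construction of the income projections. As a rough illustration of how a bivariate normal assumption can turn projected population into projected income, the Python sketch below estimates the means, variances and covariance of historical population growth and income growth, and then uses the conditional mean E[income growth | population growth] to roll income forward. Working with annual growth rates, using the conditional mean rather than random draws, and all function names are assumptions, not a description of the m-files.

```python
def fit_bivariate_normal(x, y):
    """Sample means, variances and covariance of paired observations (x, y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)
    vy = sum((b - my) ** 2 for b in y) / (n - 1)
    cxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return mx, my, vx, vy, cxy


def conditional_mean(params, x_new):
    """E[Y | X = x_new] for a bivariate normal: mu_y + (cov_xy / var_x) * (x_new - mu_x)."""
    mx, my, vx, _vy, cxy = params
    return my + cxy / vx * (x_new - mx)


def project_income(income_last, pop_growth_path, params):
    """Roll income forward one year at a time using the conditional-mean income growth."""
    path = []
    level = income_last
    for g_pop in pop_growth_path:
        level *= 1 + conditional_mean(params, g_pop)
        path.append(level)
    return path


if __name__ == "__main__":
    # Hypothetical historical growth rates of population (x) and income (y).
    params = fit_bivariate_normal([0.010, 0.012, 0.008, 0.011], [0.030, 0.034, 0.025, 0.031])
    print(project_income(100.0, [0.009, 0.010, 0.011], params))
```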
			

	Finally, running Bootstrap_in_sampleCriteria.m gives you the remaining numbers that go into Tables 3 and 4. As explained at the top of the file, its only inputs are variables from the main run, in particular 'AIC', 'R2adj', 'R_2', 'SIC', 'f_hat', 'f_hatA', 'block', 'Number' and 'NumberA'. The easiest way to load all of these correctly is to run it straight after the main run, while the variables are still in the workspace; alternatively, you can load them from the saved reg_25_bootvar variables, although you might also need some other small variables.
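For reference, the in-sample criteria named above can be computed from an OLS fit as in the following Python sketch, which uses the common textbook forms AIC = ln(SSR/n) + 2k/n and SIC = ln(SSR/n) + k ln(n)/n, plus the usual adjusted R². The main m-file may scale these differently, and the helper names are hypothetical; the point is only which direction each criterion prefers when selecting a benchmark model.

```python
import math


def selection_criteria(ssr, tss, n, k):
    """R^2, adjusted R^2, AIC and SIC for an OLS fit with k coefficients and n observations."""
    r2 = 1.0 - ssr / tss
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    aic = math.log(ssr / n) + 2.0 * k / n
    sic = math.log(ssr / n) + k * math.log(n) / n
    return {"R_2": r2, "R2adj": r2_adj, "AIC": aic, "SIC": sic}


def selected_models(criteria):
    """Index of the model each criterion selects: lowest AIC/SIC, highest R2adj/R_2."""
    models = range(len(criteria))
    return {
        "AIC": min(models, key=lambda i: criteria[i]["AIC"]),
        "SIC": min(models, key=lambda i: criteria[i]["SIC"]),
        "R2adj": max(models, key=lambda i: criteria[i]["R2adj"]),
        "R_2": max(models, key=lambda i: criteria[i]["R_2"]),
    }


if __name__ == "__main__":
    crit = [selection_criteria(50.0, 100.0, 101, 1),
            selection_criteria(40.0, 100.0, 101, 2)]
    print(selected_models(crit))
```

SIC penalises extra coefficients more heavily than AIC once n exceeds about 7, so with many candidate specifications the two criteria can select different benchmark models.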
	

This concludes the descriptions of the MATLAB m-files. I hope this description helps you replicate our results easily, or come up with new uses for this methodological approach. Feel free to contact me at ralf.steinhauser@anu.edu.au with comments or questions.


The data used for the comparison with other U.S. carbon emission forecasts, taken from the Emission Scenario Database developed for the IPCC's Special Report on Emissions Scenarios, are included in extracted form, containing the relevant U.S. predictions. The full database, from the Center for Global Environmental Research at the National Institute for Environmental Studies, can be found at the following website: http://www-cger.nies.go.jp/scenario/index.html


Best Regards, 
Ralf
